Michael Wörner, GSaME –
Universität Stuttgart, Michael.Woerner@gsame.uni-stuttgart.de
Harald Bosch, VIS – Universität Stuttgart
Steffen Koch, VIS – Universität Stuttgart
We customized tools based on previous
developments of our department, and adapted them to fit the requirements of the
challenge. We integrated them into one application along with newly built
tools.
These tools comprise: a fuzzy logic rule
evaluator to analytically determine which entities cannot be of a certain role
type, a table view of the current candidate entities for a given role, a
hypothesis graph view, a graph view to display the resulting networks, and a
map display to investigate the (inter)national connections of candidate
syndicates.
For the development, we mainly used the Java
SDK, Apache libraries, and the prefuse visualization toolkit. Additionally,
Microsoft Excel was used for some tasks.
Video:
Video.avi (DivX encoded)
ANSWERS:
MC2.1: Which of the
two social structures, A or B, most closely matches the scenario you have
identified in the data?
A
MC2.2: Provide
the social network structure you have identified as a tab-delimited file. It
should contain the employee, one or more handlers, any middle folks, and the
localized leader with their international contacts. What are the Flitter names
of the persons involved? Please identify only key connections (not every single
link, for example) as well as any other nodes related to the scenario (if any)
you may have discovered that were not described in the two scenarios A and B
above.
MC2.3: Characterize the difference between your social network and the closest social structure you selected (A or B). If you include extra nodes please explain how they fit in to your scenario or analysis.
In order to find the
structures outlined by the scenario descriptions in the provided data set, we
created a tool, partly reusing technology created in previous projects. It
restricts sets of entities (Flitter contacts in this case) based on the number
of contacts they have, optionally taking into account the role of those
contacts and supporting fuzzy rule definitions ("about 40 contacts").
Over the course of a few days, we adapted our tools for the challenge and
complemented them with new ones. We defined appropriate rules formalizing the
scenario descriptions. These rules include “an employee knows roughly 40
contacts” or “a handler knows at least 1 middle man that at least 2 other
handlers know”, for example. The creation of these rules took only a few
minutes.
Figure:
the available rules
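A fuzzy rule such as "about 40 contacts" can be thought of as a membership function that returns a value between 0 and 1. The following is only a minimal sketch of that idea, not the rule evaluator described here: the triangular shape, the ramp, the tolerance values, and all function names are assumptions chosen for illustration.

```python
def roughly(target, tolerance):
    """Triangular membership: 1.0 at `target`, falling linearly to 0.0
    at a distance of `tolerance` contacts (shape and width are assumed)."""
    def membership(n_contacts):
        return max(0.0, 1.0 - abs(n_contacts - target) / tolerance)
    return membership


def at_least_about(threshold, slack):
    """1.0 at or above `threshold`, ramping down linearly to 0.0 at
    `threshold - slack` (ramp width is assumed)."""
    def membership(n_contacts):
        if n_contacts >= threshold:
            return 1.0
        return max(0.0, 1.0 - (threshold - n_contacts) / slack)
    return membership


# Hypothetical parameterizations of two rules quoted in the text.
employee_rule = roughly(40, 10)          # "an employee knows roughly 40 contacts"
leader_rule = at_least_about(125, 25)    # "a leader knows at least about 125 contacts"

for n in (20, 45, 80, 110, 130):
    print(n, employee_rule(n), leader_rule(n))
```

A candidate whose membership value is 0 for a rule can be dropped from the corresponding role, while values between 0 and 1 lower its confidence.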
We imported the
provided data set into our tool and started with the initial hypothesis that
every person is a candidate for every role. This results in 6000 candidates for
each of the four primary roles ‘employee’, ‘handler’, ‘middle man’, and
‘leader’. By attaching rules to our starting hypothesis node via
drag-and-drop, we created derived hypotheses, successively
reducing the sets of candidates by eliminating those that do not meet the rules
for the given scenario. The subsequently applied rules comprise “an employee
knows roughly 40 contacts”, “a handler knows at least 1 employee”, “a handler
knows roughly 30-40 contacts”, “a middle man knows at least 3 handlers”, “a
leader knows at least 1 middle man”, “a leader knows at least about 125
contacts”, and “an employee knows at least 3 handlers”. All of these rules were
derived directly from the description of scenario A. By requiring candidates to
know “at least” as many contacts as specified by the scenario, we were able to
exclude those who know fewer and therefore almost certainly do not fulfill the
role requirements. For example, Flitter users with 20, 50, or even 80 contacts
can be removed from the set of possible ‘leaders’, as they do not meet the
“well over 100 contacts” rule. Many of the rules contain only approximate
values, so we assigned confidence values to the candidates, based on how well
they meet a requirement. Because we were looking for a structure that does not
necessarily satisfy all of the scenario rules perfectly, we give the analyst
the option to assign a weight to each rule, which determines the impact of
that rule on the confidence calculation. The result is shown in the figure
below.
Figure:
The application of “safe rules” reduces the possible roles for many entities.
Starting from 6000 for each role, we get 116 / 298 / 775 / 22 candidates for
employees (red) / handlers (green) / middlemen (cyan) / leaders (magenta). The
rule set on the right shows available (brighter) and already used (darker)
rules. Dropping a rule next to a node creates a circular slider to adjust the
weight of the rule and a node representing the resulting candidate set.
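The text does not state how rule weights enter the confidence calculation. One plausible way to combine them, shown here purely as an assumed sketch, is to let each rule scale the confidence by a factor that moves from 1 (weight 0, rule ignored) to the rule's membership value (weight 1, rule fully applied). The function name and the formula are hypothetical.

```python
def combined_confidence(memberships_and_weights):
    """Combine per-rule membership values into one confidence value.

    Each entry is (membership, weight), both in [0, 1]. A weight of 0
    makes the rule irrelevant; a weight of 1 applies it fully.
    The multiplicative formula is an assumption, not the tool's method.
    """
    confidence = 1.0
    for membership, weight in memberships_and_weights:
        confidence *= 1.0 - weight * (1.0 - membership)
    return confidence


# A candidate that fully meets one rule and half meets another rule
# whose weight slider is set to 0.5.
print(combined_confidence([(1.0, 1.0), (0.5, 0.5)]))
```

With such a scheme, lowering a rule's weight keeps borderline candidates in the set while still ranking them below candidates that meet the rule exactly.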
Continuing from this
point, we considered that a ‘middleman’ knows three ‘handlers’, the ‘leader’,
at most one other member of the criminal organization, and no one else. As
confirmed by a blog answer, "no one else" refers to the entire
Flitter network, so we added the rule “a middleman knows roughly 4-5 contacts”.
Noticing that the substantial reduction of ‘middlemen’ did not affect the
number of ‘handler’ candidates, we added the rule “a handler knows at least 1
middle man”, which left us with a set small enough to be visualized as a graph.
Figure:
Possible criminal networks can be displayed by hovering over an entity. This
network is incomplete because the ‘middleman’ (cyan) is not in contact with any
‘leader’ candidate.
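Relational rules such as "a handler knows at least 1 middle man" depend on the current candidate set of another role, so shrinking one set can invalidate members of another. The sketch below illustrates this with crisp (non-fuzzy) rules that are re-applied until nothing changes. The user names, the contact graph, and the function name are invented for illustration, and the repeated re-application is an assumption rather than a description of how the tool evaluates rules.

```python
def prune_to_fixpoint(candidates, contacts, rules):
    """Repeatedly apply rules of the form (role, required_role, min_count):
    a candidate for `role` must know at least `min_count` candidates for
    `required_role`. Stops when no candidate set changes any more."""
    candidates = {role: set(users) for role, users in candidates.items()}
    changed = True
    while changed:
        changed = False
        for role, required_role, min_count in rules:
            keep = {
                user for user in candidates[role]
                if len(contacts[user] & candidates[required_role]) >= min_count
            }
            if keep != candidates[role]:
                candidates[role] = keep
                changed = True
    return candidates


# Hypothetical contact graph (symmetric) and pre-filtered candidate sets.
contacts = {
    "e1": {"h1", "h2", "h3"}, "x1": {"h1", "h2", "x2"},
    "h1": {"e1", "x1", "m1"}, "h2": {"e1", "x1", "m1"},
    "h3": {"e1", "m1"}, "x2": {"x1", "m2"},
    "m1": {"h1", "h2", "h3", "l1"}, "m2": {"x2", "l1"},
    "l1": {"m1", "m2"},
}
candidates = {
    "employee": {"e1", "x1"}, "handler": {"h1", "h2", "h3", "x2"},
    "middleman": {"m1", "m2"}, "leader": {"l1"},
}
rules = [
    ("employee", "handler", 3), ("handler", "employee", 1),
    ("handler", "middleman", 1), ("middleman", "handler", 3),
    ("middleman", "leader", 1), ("leader", "middleman", 1),
]
result = prune_to_fixpoint(candidates, contacts, rules)
print(result)
```

In this toy data, removing the middleman candidate `m2` causes the handler candidate `x2` to fail its rule, which in turn removes the employee candidate `x1`.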
The entities are laid
out according to their roles. Entities which are candidates for more than one
role are represented by multiple visual items. Pointing at an entity highlights
its links to the adjacent layers and reveals that some ‘middlemen’
have no connection to any ‘leader’ candidate. Adding one last rule ("a
middle man knows at least one leader") leaves the two possible
networks shown below.
Figure:
Two possible criminal networks. The highlighted one does not fully comply with
the description because two of the handlers know each other.
In one of these
networks, two of the ‘handlers’ (@bailey and @letelier) know each other, which
contradicts the scenario description. Right-clicking @bailey and explicitly
stating that he or she "is not a handler" removes the respective
network from the hypothesis (by means of rules that are still in effect but can
no longer be met if @bailey is not a ‘handler’). This leaves only one network with
@shaffter as the ‘employee’ and @szemeredi as the ‘leader’. This result
fulfills every aspect of the description of scenario A perfectly and is our
primary solution.
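The check that rules out the second network, namely that no two handlers may know each other, can be expressed as a pairwise test. This is a minimal sketch under stated assumptions: only the link between @bailey and @letelier is taken from the text, while `@handler_x`, `@h1`, `@h2`, `@h3`, and the function name are hypothetical placeholders.

```python
from itertools import combinations


def handlers_are_independent(handlers, contacts):
    """Return True if no pair of handlers appears in each other's contacts."""
    return all(
        b not in contacts.get(a, set()) and a not in contacts.get(b, set())
        for a, b in combinations(handlers, 2)
    )


# Only the @bailey/@letelier link is taken from the text; the rest is invented.
contacts = {
    "@bailey": {"@letelier"},
    "@letelier": {"@bailey"},
}
network_with_link = ["@bailey", "@letelier", "@handler_x"]
network_without_link = ["@h1", "@h2", "@h3"]

print(handlers_are_independent(network_with_link, contacts))
print(handlers_are_independent(network_without_link, contacts))
```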
The international
contacts of the persons involved in the network can easily be investigated by
brushing over the candidate network view. All highlights are directly reflected
in the linked map display.
After identifying the
network, we checked the contacts table of the ‘middleman’ @good, who is said to
have only members of the criminal organization on his Flitter contact list. This reveals @moilanen as a possible additional member of the
organization.
We then considered
the slightly different description of scenario B, which states that “each of
the middle men probably communicates with one or two others in the
organization, and no one else”. In this context, this translates to “a
middleman has 2-3 contacts” (one handler plus 1-2 others). However, as the
minimum number of contacts for any user is 4, this rule would eliminate all
‘middleman’ candidates. Applying other “safe” rules reduces the candidate set
to 3 names that have only 4 contacts, but assuming any of them as a ‘middleman’
does not result in the expected network structure. Strictly applying the other
criteria excludes all potential result networks that have the correct number of
contacts for ‘employees’ and ‘handlers’.
MC2.4: How is your hypothesis about the social structure in Part 1 supported by the city locations of Flovania? What part(s), if any, did the role of geographical information play in the social network of part one?
The ‘employee’ and
the three ‘handlers’ identified by the analysis in 2.3 are from the same city,
Prounov. The ‘middleman’ resides some distance away, in Kannvic, Flovania,
northwest of Prounov. ‘Fearless Leader’ himself is located farther north,
in Kouvnic. This supports the social structure from Part 1: The ‘handlers’ stay
in close contact with the employee. The ‘middleman’ keeps a distance and
reports to the leader who resides even further away. We created a rule to check
this fact (“an employee knows at least 3 handlers from the same city”) and it
eliminates many candidate ‘employees’ that already had reduced confidence
values due to violating other fuzzy constraints (“roughly 40 contacts”).
Figure:
Applying the above-mentioned rule at this stage reduces the possible employees
from 144 to 21, despite the low confidence penalty of 0.47.
Again,
this structure can be easily validated using the interactive map view.
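The geographic rule "an employee knows at least 3 handlers from the same city" can be sketched as a filter over an employee's contacts. The interpretation used here, that the handlers must live in the employee's own city, is an assumption, as are the user names, the city assignments apart from the city names themselves, and the function name.

```python
def knows_local_handlers(employee, contacts, handler_candidates, city_of,
                         min_count=3):
    """True if `employee` knows at least `min_count` handler candidates
    who live in the employee's own city (assumed reading of the rule)."""
    home = city_of[employee]
    local_handlers = [
        contact for contact in contacts[employee]
        if contact in handler_candidates and city_of[contact] == home
    ]
    return len(local_handlers) >= min_count


# Hypothetical users; only the city names come from the text.
city_of = {
    "@emp_a": "Prounov", "@emp_b": "Koul",
    "@h1": "Prounov", "@h2": "Prounov", "@h3": "Prounov", "@h4": "Koul",
}
contacts = {
    "@emp_a": {"@h1", "@h2", "@h3", "@h4"},
    "@emp_b": {"@h1", "@h2", "@h4"},
}
handler_candidates = {"@h1", "@h2", "@h3", "@h4"}

for employee in ("@emp_a", "@emp_b"):
    print(employee, knows_local_handlers(employee, contacts,
                                         handler_candidates, city_of))
```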
MC2.5: In general, how are the Flitter users dispersed throughout the cities of this challenge? Which of the surrounding countries may have ties to this criminal operation? Why might some be of more significant concern than others?
The dispersion of
Flitter users can be calculated by exporting the contact tables, which
contain the location and role confidence of each contact, from our tool to MS
Excel and aggregating the values with a Pivot Table. Because our tool can
list the contacts of any set of entities, the same Pivot Table can be reused
to obtain this information for the contacts of the criminal organization
alone. The result shows that the leader in particular has more contacts in
Posana than average.
Overall dispersion:
City | Flitter users | %
Koul | 1998 | 33.30
Prounov | 1707 | 28.45
Kouvnic | 798 | 13.30
Kannvic | 320 | 5.33
Solvenz | 210 | 3.50
Pasko | 147 | 2.45
Otello | 147 | 2.45
Sresk | 147 | 2.45
Ryzkland | 142 | 2.37
Solank | 135 | 2.25
Transpasko | 126 | 2.10
Tulamuk | 123 | 2.05
Origins of the
contacts of the criminal network:
Country | Contacts | % | % of all users
Posana | 9 | 2.62% | 2.45%
Flovania | 321 | 93.59% | 93.40%
Transak | 6 | 1.75% | 2.10%
Trium | 7 | 2.04% | 2.05%
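The percentages in the table above follow directly from the contact counts, and comparing them with the overall user shares shows which countries are over- or under-represented among the network's contacts. The sketch below reproduces that arithmetic in place of the Excel Pivot Table. The numbers are taken from the table, while the variable and function names are illustrative.

```python
# Contact counts of the criminal network per country (from the table).
network_contacts = {"Posana": 9, "Flovania": 321, "Transak": 6, "Trium": 7}
# Share of all Flitter users per country in percent (from the table).
overall_share = {"Posana": 2.45, "Flovania": 93.40, "Transak": 2.10, "Trium": 2.05}


def shares(counts):
    """Convert absolute counts into percentages of their total."""
    total = sum(counts.values())
    return {key: 100.0 * value / total for key, value in counts.items()}


network_share = shares(network_contacts)
for country in network_contacts:
    difference = network_share[country] - overall_share[country]
    print(f"{country}: {network_share[country]:.2f}% of network contacts, "
          f"{overall_share[country]:.2f}% of all users ({difference:+.2f})")
```

A positive difference, as for Posana, indicates that the network has proportionally more contacts in that country than the user base as a whole.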